We propose an approach for semantic imitation, which uses demonstrations from a source domain, e.g. human videos, to accelerate reinforcement learning (RL) in a different target domain, e.g. a robotic manipulator in a simulated kitchen. Instead of imitating low-level actions like joint velocities, our approach imitates the sequence of demonstrated semantic skills like "opening the microwave" or "turning on the stove". This allows us to transfer demonstrations across environments (e.g. real-world to simulated kitchen) and agent embodiments (e.g. bimanual human demonstration to robotic arm). We evaluate on three challenging cross-domain learning problems and match the performance of demonstration-accelerated RL approaches that require in-domain demonstrations. In a simulated kitchen environment, our approach learns long-horizon robot manipulation tasks, using less than 3 minutes of human video demonstrations from a real-world kitchen. This enables scaling robot learning via the reuse of demonstrations, e.g. collected as human videos, for learning in any number of target domains.
We formulate grasp learning as a neural field and present Neural Grasp Distance Fields (NGDF). Here, the input is a 6D pose of a robot end effector and the output is the distance to a continuous manifold of valid grasps for an object. In contrast to current approaches that predict a set of discrete candidate grasps, the distance-based NGDF representation is easily interpreted as a cost, and minimizing this cost produces a successful grasp pose. This grasp distance cost can be incorporated directly into a trajectory optimizer for joint optimization with other costs such as trajectory smoothness and collision avoidance. Because the learned grasp field is continuous, the grasp target can vary smoothly during optimization as the various costs are balanced and minimized. In simulation benchmarks with a Franka arm, we find that joint grasping and planning with NGDF outperforms baselines by 63% in execution success while generalizing to unseen query poses and unseen object shapes. Project page: https://sites.google.com/view/neural-grasp-distance-fields.
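The idea of treating a grasp distance as a cost can be illustrated with a toy example. The sketch below is not NGDF: it replaces the learned 6D field with a hypothetical analytic one (the distance from a 2D point to a circle that stands in for the manifold of valid grasps) and omits the smoothness and collision costs. It only shows that descending the squared distance from different starting points lands on different points of the same continuous manifold.

```python
import math


def grasp_distance(p, radius=1.0):
    """Toy stand-in for a learned field: distance from 2D point p to a circle."""
    return abs(math.hypot(p[0], p[1]) - radius)


def descend(p, radius=1.0, lr=0.1, steps=100):
    """Gradient descent on the squared toy distance (r - radius)^2."""
    x, y = p
    for _ in range(steps):
        r = math.hypot(x, y)
        if r == 0.0:
            raise ValueError("gradient is undefined at the circle's centre")
        # The gradient of (r - radius)^2 with respect to (x, y) is scale * (x, y).
        scale = 2.0 * (r - radius) / r
        x, y = x - lr * scale * x, y - lr * scale * y
    return x, y


if __name__ == "__main__":
    for start in [(3.0, 0.0), (0.0, -2.0)]:
        end = descend(start)
        print(start, "->", end, "distance:", grasp_distance(end))
```

Each start converges to the nearest point on the circle rather than to one fixed target, which is the property a discrete set of candidate grasps lacks.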
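We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in interactive 3D environments and complex physics-enabled scenarios. We make comprehensive contributions to all levels of the embodied AI stack: data, simulation, and benchmark tasks. Specifically, we present: (i) ReplicaCAD: an artist-authored, annotated, reconfigurable 3D dataset of apartments (matching real spaces) with articulated objects (e.g. cabinets and drawers that can open/close); (ii) H2.0: a high-performance physics-enabled 3D simulator with speeds exceeding 25,000 simulation steps per second (850x real-time) on an 8-GPU node, representing a 100x speed-up over prior work; and (iii) Home Assistant Benchmark (HAB): a suite of common tasks for assistive robots (tidy the house, prepare groceries, set the table) that test a range of mobile manipulation capabilities. These large-scale engineering contributions allow us to systematically compare reinforcement learning (RL) at scale and classical sense-plan-act (SPA) pipelines in long-horizon structured tasks, with an emphasis on generalization to new objects, receptacles, and layouts. We find that (1) flat RL policies struggle on HAB compared to hierarchical ones; (2) a hierarchy with independent skills suffers from "hand-off problems"; and (3) SPA pipelines are more brittle than RL policies.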
Robots need to adapt to unexpected changes in the environment so that they can succeed in their tasks autonomously. However, hand-designing feedback models for adaptation is tedious, if at all possible, making data-driven methods a promising alternative. In this paper we introduce a full framework for learning feedback models for reactive motion planning. Our pipeline starts by segmenting demonstrations of a complete task into motion primitives via a semi-automated segmentation algorithm. Then, given additional demonstrations of successful adaptation behaviors, we learn initial feedback models through learning from demonstrations. In the final phase, a sample-efficient reinforcement learning algorithm fine-tunes these feedback models for novel task settings using only a few interactions with the real system. We evaluate our approach on a real anthropomorphic robot learning a tactile feedback task.
Estimating the 6D pose of objects is a major problem in 3D computer vision. Following the promising results of instance-level pose estimation, research is moving towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets fall short in both annotation quality and the number of annotated poses. We propose HouseCat6D, a new category-level 6D pose dataset featuring 1) multi-modality with polarimetric RGB+P and depth, 2) 194 highly diverse objects from 10 household object categories, including 2 photometrically challenging categories, 3) high-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage, and 5) a checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
Automatic segmentation is essential for brain tumor diagnosis, disease prognosis, and follow-up therapy of patients with gliomas. Still, accurate detection of gliomas and their sub-regions in multimodal MRI is very challenging due to the variety of scanners and imaging protocols. Over the last years, the BraTS Challenge has provided a large number of multi-institutional MRI scans as a benchmark for glioma segmentation algorithms. This paper describes our contribution to the BraTS 2022 Continuous Evaluation challenge. We propose a new ensemble of multiple deep learning frameworks, namely DeepSeg, nnU-Net, and DeepSCAN, for automatic glioma boundary detection in pre-operative MRI. Our ensemble models took first place in the final evaluation on the BraTS testing dataset, with Dice scores of 0.9294, 0.8788, and 0.8803 and Hausdorff distances of 5.23, 13.54, and 12.05 for the whole tumor, tumor core, and enhancing tumor, respectively. Furthermore, the proposed ensemble method ranked first in the final ranking on another unseen test dataset, the Sub-Saharan Africa dataset, achieving mean Dice scores of 0.9737, 0.9593, and 0.9022 and HD95 of 2.66, 1.72, and 3.32 for the whole tumor, tumor core, and enhancing tumor, respectively. The docker image for the winning submission is publicly available at https://hub.docker.com/r/razeineldin/camed22.
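For readers unfamiliar with the Dice score reported above, a minimal sketch follows. It computes the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on flattened binary masks. The abstract does not say how the three models are combined, so the `majority_vote` helper is purely a hypothetical stand-in for an ensembling step, not the authors' method.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def majority_vote(masks):
    """Hypothetical ensembling step: voxel-wise majority vote over binary masks."""
    n = len(masks)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*masks)]


if __name__ == "__main__":
    model_masks = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]  # made-up predictions
    ground_truth = [1, 0, 1]                         # made-up reference mask
    fused = majority_vote(model_masks)
    print(fused, dice_score(fused, ground_truth))
```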
With the rise of AI and automation, moral decisions that were formerly the preserve of humans are being put into the hands of algorithms. In autonomous driving, a variety of such decisions with ethical implications are made by algorithms for behavior and trajectory planning. Therefore, we present an ethical trajectory planning algorithm with a framework that aims at a fair distribution of risk among road users. Our implementation incorporates a combination of five essential ethical principles: minimization of the overall risk, priority for the worst-off, equal treatment of people, responsibility, and maximum acceptable risk. To the best of the authors' knowledge, this is the first ethical algorithm for trajectory planning of autonomous vehicles that is in line with the 20 recommendations from the EU Commission expert group and generally applicable to various traffic situations. We showcase the ethical behavior of our algorithm in selected scenarios and provide an empirical analysis of the ethical principles in 2000 scenarios. The code used in this research is available as open-source software.
Incorporating computed tomography (CT) reconstruction operators into differentiable pipelines has proven beneficial in many applications. Such approaches usually focus on the projection data and keep the acquisition geometry fixed. However, precise knowledge of the acquisition geometry is essential for high-quality reconstruction results. In this paper, the differentiable formulation of fan-beam CT reconstruction is extended to the acquisition geometry. This allows gradient information to be propagated from a loss function on the reconstructed image into the geometry parameters. As a proof-of-concept experiment, this idea is applied to rigid motion compensation. The cost function is parameterized by a trained neural network which regresses an image quality metric from the motion-affected reconstruction alone. Using the proposed method, we are the first to optimize such an autofocus-inspired algorithm based on analytical gradients. The algorithm achieves a reduction in MSE of 35.5 % and an improvement in SSIM of 12.6 % over the motion-affected reconstruction. Beyond motion compensation, we see further use cases of our differentiable method in scanner calibration and in hybrid techniques employing deep models.
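Self-supervised models are increasingly prevalent in machine learning (ML) because they reduce the need for expensively labeled data. Owing to their versatility in downstream applications, they are increasingly offered as a service exposed via public APIs. At the same time, these encoder models are particularly vulnerable to model stealing attacks due to the high dimensionality of the vector representations they output. Yet encoders remain undefended: existing mitigation strategies for stealing attacks focus on supervised learning. We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing. The intuition is that the log-likelihood of an encoder's output representations is higher on the victim's training data than on test data if the encoder was stolen from the victim, but not if it was trained independently. We compute this log-likelihood using density estimation models. As part of our evaluation, we also propose measuring the fidelity of stolen encoders and quantifying the effectiveness of theft detection without involving downstream tasks; instead, we leverage mutual information and distance measurements. Our extensive empirical results in the vision domain demonstrate that dataset inference is a promising direction for defending self-supervised models against model stealing.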
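The log-likelihood comparison behind dataset inference can be sketched on toy data. The example below makes several assumptions that are not from the abstract: representations are reduced to scalars, the density model is a single Gaussian fitted with the standard library, and the decision statistic is a plain difference of mean log-likelihoods rather than a statistical test. All function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(values):
    """Fit a 1D Gaussian density model (an assumed, deliberately simple estimator)."""
    mu = fmean(values)
    var = max(pvariance(values, mu), 1e-12)  # guard against zero variance
    return mu, var


def mean_log_likelihood(values, mu, var):
    """Average Gaussian log-density of the given scalar representations."""
    return fmean(
        [-0.5 * (math.log(2.0 * math.pi * var) + (v - mu) ** 2 / var) for v in values]
    )


def likelihood_gap(fit_reps, private_reps, test_reps):
    """Mean log-likelihood on private training data minus that on test data."""
    mu, var = fit_gaussian(fit_reps)
    return mean_log_likelihood(private_reps, mu, var) - mean_log_likelihood(
        test_reps, mu, var
    )


if __name__ == "__main__":
    # Made-up scalar "representations" produced by a suspect encoder.
    fit_reps = [0.0, 1.0, -1.0, 0.5, -0.5]   # part of the private training set
    private_reps = [0.2, -0.2]               # held-out private training points
    test_reps = [3.0, -3.0]                  # unrelated test points
    print("gap:", likelihood_gap(fit_reps, private_reps, test_reps))
```

Under the intuition stated in the abstract, a clearly positive gap would point towards a stolen encoder, while a gap near zero would be consistent with independent training.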
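Neural networks trained with stochastic gradient descent (SGD) starting from different random initializations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al. recently conjectured that, despite different initializations, the solutions found by SGD lie in the same loss valley once the permutation invariance of neural networks is taken into account. Concretely, they hypothesize that any two solutions found by SGD can be permuted such that the linear interpolation between their parameters forms a path without significant increases in loss. Here, we use a simple but powerful algorithm to find such permutations, which allows us to obtain direct empirical evidence that the hypothesis is true in fully connected networks. Strikingly, we find that two networks already live in the same loss valley at the time of initialization, and averaging their random but suitably permuted initializations performs significantly above chance. In contrast, for convolutional architectures, our evidence suggests that the hypothesis does not hold. Especially in a large learning rate regime, SGD seems to discover diverse modes.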
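Permutation invariance and its effect on linear interpolation can be seen on a tiny hand-built network. The sketch below does not implement the permutation-finding algorithm from the paper: the permutation is known by construction, and the network, weights, and function names are all hypothetical. It shows that reordering hidden units leaves the function unchanged, that naive averaging of the two parameter sets changes the function, and that averaging after undoing the permutation does not.

```python
def forward(w1, b1, w2, x):
    """One-hidden-layer ReLU network: y = w2 . relu(w1 x + b1)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)
    ]
    return sum(v * h for v, h in zip(w2, hidden))


def permute_hidden(params, perm):
    """Reorder hidden units; new unit i is old unit perm[i]."""
    w1, b1, w2 = params
    return [w1[i] for i in perm], [b1[i] for i in perm], [w2[i] for i in perm]


def interpolate(params_a, params_b, alpha):
    """Parameter-wise linear interpolation (1 - alpha) * A + alpha * B."""
    def mix(a, b):
        return (1.0 - alpha) * a + alpha * b

    (w1a, b1a, w2a), (w1b, b1b, w2b) = params_a, params_b
    w1 = [[mix(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(w1a, w1b)]
    return w1, [mix(a, b) for a, b in zip(b1a, b1b)], [mix(a, b) for a, b in zip(w2a, w2b)]


if __name__ == "__main__":
    net_a = ([[1.0, -2.0], [0.5, 1.0]], [0.0, 1.0], [2.0, -1.0])
    net_b = permute_hidden(net_a, [1, 0])        # same function, units swapped
    x = [1.0, 1.0]
    print(forward(*net_a, x), forward(*net_b, x))              # identical outputs
    print(forward(*interpolate(net_a, net_b, 0.5), x))         # naive midpoint differs
    aligned_b = permute_hidden(net_b, [1, 0])                  # undo the swap
    print(forward(*interpolate(net_a, aligned_b, 0.5), x))     # matches net_a again
```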